ABSTRACT

Aims. This paper is the second in a series, implementing a classification system for Gaia observations of unresolved galaxies. Our 
goals are to determine spectral classes and estimate intrinsic astrophysical parameters via synthetic templates. Here we describe (1) 
a new extended library of synthetic galaxy spectra, (2) its comparison with various observations, and (3) first results of classification 
and parametrization experiments using simulated Gaia spectrophotometry of this library. 

Methods. Using the PEGASE.2 code, based on galaxy evolution models that take account of metallicity evolution, extinction correc- 
tion, and emission lines (with stellar spectra based on the BaSeL library), we improved our first library and extended it to cover the 
domain of most of the SDSS catalogue. Our classification and regression models were Support Vector Machines (SVMs). 
Results. We produce an extended library of 28 885 synthetic galaxy spectra at zero redshift covering four general Hubble types of 
galaxies, over the wavelength range between 250 and 1050 nm at a sampling of 1 nm or less. The library is also produced for 4 random 
values of redshift in the range of 0-0.2. It is computed on a random grid of four key astrophysical parameters (infall timescale and 3 
parameters defining the SFR) and, depending on the galaxy type, on two values of the age of the galaxy. The synthetic library was 
compared and found to be in good agreement with various observations. The first results from the SVM classifiers and parametrizers 
are promising, indicating that Hubble types can be reliably predicted and several parameters estimated with low bias and variance. 

Key words. - Galaxies: fundamental parameters - Techniques: photometric - Techniques: spectroscopic 

Fig. 1. The first library of synthetic galaxy spectra. The SDSS
galaxies, the galaxies produced in the first library, and the typical 
synthetic spectra of PEGASE.2 are presented with black, red, 
and yellow dots, respectively. 



Fig. 2. Models of Im galaxies with SFR stopping at 1 Myr to
2 Gyr ago (magenta). Black dots are the SDSS galaxies and red
the 8 typical synthetic spectra of PEGASE.2.



spectra corresponding to different typical types of Hubble sequence
galaxies (E, S0, Sa, Sb, Sbc, Sc, Sd, and Im) were produced using
PEGASE.2 (Fioc 1997; Fioc & Rocca-Volmerange 1999; Le Borgne &
Rocca-Volmerange 2002). By expanding the range of the input parameter
values of these eight typical models and applying selection criteria
for each type, we produced our first library of synthetic galaxy
spectra (Tsalmantza et al. 2007). This library consists of 888 spectra
produced on a regular grid of input parameter values and 2709 spectra
produced on a random grid. These spectra correspond to seven spectral
types of galaxies: E-S0, Sa, Sb, Sbc, Sc, Sd, and Im. Only for E-S0
galaxies did we model galactic winds, as was the case for the 
original PEGASE.2 models. The part of the library constructed 
by the regular grid was also produced for 4 values of redshift: 
0.05, 0.1, 0.15, and 0.2. 

This first library of synthetic galaxy spectra at zero redshift 
was compared with the SDSS data (DR4) of galaxies. Although 
the photometry produced by the synthetic spectra was in very 
good agreement with the observational data, only a narrow locus 
of the SDSS colour-colour diagram was covered (Fig. 1). For the
classification and parametrization tasks of Gaia, the production 
of a large variety of galaxies is mandatory, to interpret all obser- 
vational data. To accomplish this, we attempted to cover most of 
the SDSS colour-colour diagram in the second library presented 
here. 

For the extension of the first library of synthetic spectra of 
galaxies, we had to overcome two main problems when comparing
with SDSS data (Fig. 1): i) the spread in the blue part of the
diagram, where true data have a large variance, while all the syn- 
thetic irregular galaxies are distributed along a line; and ii) the 
systematic deviation between synthetic and true data in the red 
part of the diagram, where early-type galaxies are located. 

In Sect. 2, where we describe our method to produce our 
second library of synthetic galaxy spectra, we develop and apply 
solutions to these two problems (Sect. 2.1 and 2.2). In Sect. 3, 
we check our library in other colours and in Sect. 4 we present 
our final library produced on a random grid of physical parameters.
In Sect. 5 we compare our library with the Kennicutt Atlas
of galaxies. The simulated Gaia spectra for the final library are 
described in Sect. 6, while in Sect. 7 we present the classification 
and parametrization models used and their first results for these 
data. A brief discussion follows in Sect. 8. 



2. The second library of the synthetic spectra of 
galaxies 

2.1. The blue part of the colour-colour diagram - developing
scenarios for quenched star-forming galaxies

In the first library of synthetic spectra of galaxies (Fig. 1), the
blue part of the SDSS colour-colour diagram (r-i < 0.15 mag)
is covered only by irregular (Im) galaxies. However, starburst
galaxies could also have such blue colours. In the models of
PEGASE.2 used to produce starburst galaxies (Le Borgne &
Rocca-Volmerange 2002), the age of the galaxy can vary from
1 Myr to 2 Gyr, while the SFR is given by a delta function lasting
for only 1 Myr. To use models with more realistic values of pa- 
rameters, we investigated various scenarios. The one providing 
the most comprehensive coverage of the blue part of the SDSS 
colour-colour diagram was based on models of irregular galaxies 
in which star formation had stopped at a certain time in the past 
instead of continuing until the present. Using the original model 
of Im galaxies and stopping star formation at various ages from 
1 Myr to 2 Gyr before the present (BP), we produced some ex- 
amples of these models. Their synthesized colours are presented 
in Fig. 2, where we see that the spread in the properties of the
SDSS galaxies in the blue part of the diagram can be covered by 
applying this approach to all the irregular galaxies in our library. 

We are able to reproduce the properties of galaxies with such 
blue colours by assuming that the SF in the irregular models 
stops at a certain time (see last row of Table 1, where p3 varies
from 1 Myr to 250 Myr BP in the produced spectra). In this way,
we can include the bulk of the flux produced by supergiants and 
AGB stars (which have very high masses and evolve rapidly). 
Assuming that the star formation stopped even earlier would cre- 
ate galaxies with too red colours (i.e., redder than the ones corre- 
sponding to the other types of galaxies included in our library). 
In the sections that follow, we refer to the galaxies produced by 
this model as quenched star-forming galaxies. 

This model is clearly more realistic than one with a delta 
function star formation history (SFH). Blue galaxies produced 
in this way have an age of 9 Gyr and not just a few Myr. In 
their quiescent phase, this population of quenched star-forming
galaxies resembles periodically bursting dwarfs, the properties of
which are indeed required to fit the UV galaxy counts (Fioc &
Rocca-Volmerange 1999). A further analysis of the SEDs in the far-UV
will be required to confirm this result.

Fig. 3. Models of E galaxies produced assuming exponential star
formation rate (magenta). Black dots are SDSS galaxies and red
the 8 typical synthetic spectra of PEGASE.2.

The SFH used to reproduce the properties of quenched star-forming
galaxies in the new version of our library is given in Table 1. Using
10 different values for p3, we produced 1584 × 10 = 15 840 synthetic
spectra for quenched star-forming galaxies. In Table 2, the range of
the input parameters is given, while Fig. 4 shows our results (magenta
points). Most of the blue part of the SDSS colour-colour diagram is
now covered.
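
The star-formation laws listed in Table 1 can be written down as a minimal Python sketch. This is only an illustration of the functional forms as we read them from the table, not PEGASE.2 code: the function names are hypothetical, the gas mass is passed in as a plain number (in PEGASE.2 it evolves with time through infall and consumption), and times are assumed to be in Myr.

```python
import math


def sfr_exponential(t, p1, p2):
    """Early-type law: SFR(t) = p2 * exp(-t / p1) / p1 (t and p1 in Myr)."""
    return p2 * math.exp(-t / p1) / p1


def sfr_gas_power_law(m_gas, p1, p2):
    """Spiral and irregular law: SFR = M_gas**p1 / p2."""
    return m_gas ** p1 / p2


def sfr_quenched(t, m_gas, p1, p2, p3, t_f=9000.0):
    """Quenched law: gas power law for t < t_f - p3, zero afterwards.

    t_f = 9000 Myr is the galaxy age quoted for this scenario; p3 is the
    time before the present at which star formation stops.
    """
    if t < t_f - p3:
        return sfr_gas_power_law(m_gas, p1, p2)
    return 0.0
```

With p3 = 250 Myr, for example, `sfr_quenched` returns zero for every t at or above 8750 Myr, which is the behaviour that moves the irregular models off the single line they occupied in the first library.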



2.2. The red part of the colour-colour diagram - adopting an
exponential SFR law for early-type galaxies

In the first library, we assumed a star-formation rate that is
proportional to the gas mass (SFR = M_gas/p2) and the presence
of infall and galactic winds to reproduce spectra of early-type
galaxies. As described in the introduction and as can be seen in
Fig. 1, there is a small deviation between the predicted properties
of this type of galaxy produced in the first library and
those observed for red galaxies of SDSS. To solve this problem,
we tested several methods. The most successful was to increase
the amplitude of the initial starburst and decrease its duration.
To achieve this, we adopted an exponential SFR for early-type
galaxies. Using this scenario, one cannot include infall since the
presence of infall means that the mass of the gas is zero at t=0.
Galactic winds are also not included in this model.

To test this model, we initially used a small set of values for
the parameters p1 and p2 of the exponential SFR
(SFR = p2 exp(-t/p1)/p1) and created models using all the
combinations of the following parameter values:

p1: 50, 100, 250, 500, 1000, 1500, 2000 & 2500 Myr

p2: 0.5, 0.6, 0.65, 0.7, 0.75 & 1

The synthesized SDSS photometry of these models is shown
in Fig. 3. By increasing p2, we produce galaxies with redder
colours, while the influence of p1 is weaker (since it
appears both in the exponent and in the denominator
of the SFR). In Fig. 3 it is clear that the differences between
the red parts of the spectrum of synthetic and true galaxies have
decreased compared to the first library. For this reason we
decided to assume this scenario (Table 1) when modelling the new
library of synthetic galaxy spectra of E and S0 galaxies. Initially,
we produced 2015 synthetic spectra based on a regular grid of
input parameters (Table 2). The simulated colours are presented in
Fig. 4 (red points). In this plot, we can see that there is now a
closer agreement between the synthetic and observed data than
was possible with the first library.

2.3. Spiral and Irregular galaxies

For the case of spiral and irregular galaxies, the scenarios used
in the first library were adopted here (Table 1), while their input
parameter values are extended to a wider range (Table 2). Gas
infall was taken into account, while the age of the galaxies was
kept at 9 Gyr and 13 Gyr for irregular and spiral galaxies,
respectively. In Fig. 4, the colour-colour diagram for the 9590 model
spiral galaxies (light blue points) and the 1584 model irregular
galaxies (blue points) is presented. In this plot, we can see
that the new synthetic spectra of spiral galaxies produced here
cover a larger part of the SDSS colour-colour diagram than the
first library (Fig. 1).

Table 1. Models of SF assumed in the new library.

Galaxy type                      SFR

Early-type galaxies              p2 exp(-t/p1)/p1
Spiral galaxies                  M_gas^p1/p2
Irregular galaxies               M_gas^p1/p2
Quenched star-forming galaxies   M_gas^p1/p2 for t < tf - p3
                                 0 for t > tf - p3
                                 (tf = 9 Gyr, the age of the galaxy)



Table 2. Input parameters for the galaxy scenarios in the new
library.

Parameter          Range of values

Early-type galaxies
p1                 10-30 000 (Myr)
p2                 0.2-1.5 (M_sun)
age                13 (Gyr)

Spiral galaxies
p1                 0.3-2.4
p2                 5-30 000 (Myr/M_sun)
infall timescale   5-16 000 (Myr)
age                13 (Gyr)

Irregular galaxies
p1                 0.6-3.9
p2                 4000-70 000 (Myr/M_sun)
infall timescale   5000-30 000 (Myr)
age                9 (Gyr)

Quenched star-forming galaxies
p1                 0.6-3.9
p2                 4000-70 000 (Myr/M_sun)
p3                 1-250 (Myr)
infall timescale   5000-30 000 (Myr)
age                9 (Gyr)


Synthesized colours of all 29 029 synthetic spectra in the new
library are superimposed on the SDSS observed data in Fig. 4.
The coverage is much improved, although a problem remains: 
the boundaries of the areas corresponding to each of the 4 sce- 
narios are not clear. To solve this problem, we used UBV pho- 
tometry as we describe in the next section. 
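
As a quick consistency check, the total of 29 029 regular-grid spectra follows from the per-scenario counts quoted in Sect. 2. The short Python snippet below only repeats that arithmetic; the dictionary and its keys are illustrative names, not part of any pipeline.

```python
# Regular-grid spectra per scenario, as quoted in Sects. 2.1-2.3.
regular_grid_counts = {
    "early-type": 2015,
    "spiral": 9590,
    "irregular": 1584,
    "quenched star-forming": 1584 * 10,  # 10 values of p3 per irregular model
}

total = sum(regular_grid_counts.values())
print(total)  # 29029
```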

Fig. 4. Models of irregular (blue), quenched star-forming galaxies
(magenta), spirals (light blue), and early-type galaxies (red).
Black dots are SDSS galaxies and green the 8 typical synthetic
spectra of PEGASE.2.



3. Criteria describing the spectral type based on 
UBV photometry 

It is known that the U-B versus B-V colour-colour diagram can
provide a means of spectrally classifying galaxies. To define
the parameter space corresponding to each of the 4 scenarios
defined in Sect. 2.3, we compare our synthetic UBV colours with
observational values for galaxies of known spectral type.

The observational data used here were taken from the LEDA
catalog (Paturel et al. 1997). This catalog contains 2672 galaxies
with estimated numeric photometric type (T) corresponding
to the RC3 catalog and calculated total apparent U-B and B-V
colours. Those colours are corrected for Galactic extinction,
inclination, and redshift. In Figs. 5-8, we present the U-B versus
B-V colour-colour diagram for all the synthetic spectra produced
here (black dots) plotted over the galaxies of the LEDA catalog.
According to this catalog, galaxies with T < 0.5 are considered
to be early-type galaxies, those with 0.5 < T < 8.5 spirals, and
those with T > 8.5 irregulars. These types of galaxies are presented
in Figs. 5-8. The spread of the real data in the colour-colour diagram
is larger than that of the synthetic spectra, most probably because
of errors in the calculations of UBV colours in the LEDA
catalog and limitations of the PEGASE model.

The synthetic and observational data are in good agreement,
even though a small difference in the slope is observed. This
could be explained by considering that the U-B colour is the one
with the largest errors in the simulated photometry (Yi 2003).
For that reason, we based our selection criteria only on the B-V
colour. In Figs. 9 and 10, we present the B-V distributions,
normalized to the total number of galaxies, for the observational
LEDA catalog and the synthetic spectra, respectively, for the three
types of galaxies. In Fig. 10, we included quenched star-forming
galaxies (green) even though they are not included in the LEDA
catalog. Since this type of galaxy is produced by models of irregular
galaxies, some of them are excluded by the selection
criteria applied to the irregular galaxies.

Comparing these two histograms, we can see that for each
type of galaxy the peaks of the histograms are approximately
coincident for both observational and synthetic spectra. Using
the distribution of the observational data, we decided to keep in
our library early-type galaxies with B-V > 0.6, spirals with
0.3 < B-V < 0.9, and irregulars with B-V < 0.6. In all
cases, more than 90% of real galaxies of each type remain after
applying the above selection criteria. In the case of synthetic
spectra, these criteria affect mainly spiral and early-type galaxies,
where the number of galaxies is reduced after their application.
However, the range of parameters remains the same in all cases
and only some combinations are removed from our original sample.

1 http://leda.univ-lyon1.fr/

Fig. 5. Early-type, spiral, and irregular (red, green, and blue)
galaxies derived from the LEDA catalog. Black dots are all
galaxies (including quenched star-forming galaxies) produced
by PEGASE.2.

Fig. 6. As in Fig. 5, but now the black dots are the properties of
irregular galaxies produced by PEGASE.2.
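
The B-V selection described in this section amounts to a simple per-type cut, sketched below in Python. The function name and type labels are hypothetical, the lower spiral bound is read as 0.3, and applying the irregular cut to the quenched star-forming models is our reading of the statement that some of them are removed by the criteria for irregular galaxies.

```python
def passes_bv_cut(galaxy_type, b_minus_v):
    """Return True if a model of the given type survives the B-V selection."""
    if galaxy_type == "early-type":
        return b_minus_v > 0.6
    if galaxy_type == "spiral":
        return 0.3 < b_minus_v < 0.9
    # Assumption: quenched star-forming models share the irregular criterion.
    if galaxy_type in ("irregular", "quenched star-forming"):
        return b_minus_v < 0.6
    raise ValueError(f"unknown galaxy type: {galaxy_type}")
```

Because the cut acts on the synthesized colour rather than on the input parameters, it removes individual parameter combinations without shrinking the parameter ranges themselves.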

4. The random library of synthetic galaxy spectra 

The library of synthetic spectra of galaxies described in Sect.
2 was produced by using a regular grid of input parameters of
PEGASE. To achieve optimal training and assessment of the
classification and parametrization algorithms, one should use
data produced by a random grid of parameters. For that reason,
we used the range of parameters given in Table 2 to compile
30 500 random scenarios of galaxies. To create spectra using
PEGASE.2, we used GRID-technology provided by SEE-GRID
(South Eastern Europe). We applied the B-V criteria described
in the previous section to the resulting spectra. By applying this
procedure, we produced 2816 early-type galaxies, 10 569 spirals,
1500 irregulars, and 14 000 quenched star-forming galaxies. The
derived library of 28 885 synthetic galaxy spectra is presented in
Fig. 11, where the colours of those spectra are plotted over the
SDSS data. The new synthesized colours are in very good agreement
with the SDSS observations.

Fig. 7. As in Fig. 5, but now the black dots are the properties of
spiral galaxies produced by PEGASE.2.

Fig. 8. As in Fig. 5, but now the black dots are the properties of
early-type galaxies produced by PEGASE.2.

Fig. 9. Distribution (normalized) of B-V colours for early-type,
spiral, and irregular (red, black, and blue respectively) galaxies
derived from the LEDA catalog.

Each spectrum in this library was simulated at four random
values of redshift, one in each of the four intervals 0 to 0.05,
0.05 to 0.1, 0.1 to 0.15, and 0.15 to 0.2. The final library now
includes 144 425 synthetic spectra produced by a random grid
of parameters.
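
The redshift sampling can be illustrated with a short Python sketch. The uniform distribution within each interval and the fixed seed are assumptions made only for this example. The final count of 144 425 is consistent with keeping the zero-redshift version of each of the 28 885 spectra alongside its four redshifted copies.

```python
import random

REDSHIFT_BINS = [(0.0, 0.05), (0.05, 0.1), (0.1, 0.15), (0.15, 0.2)]


def draw_redshifts(rng):
    """Return one redshift drawn uniformly (an assumption) from each interval."""
    return [rng.uniform(lo, hi) for lo, hi in REDSHIFT_BINS]


rng = random.Random(42)  # fixed seed only to make the sketch reproducible
redshifts = draw_redshifts(rng)

n_zero_redshift = 28885
# One zero-redshift copy plus one copy per redshift interval.
n_total = n_zero_redshift * (1 + len(REDSHIFT_BINS))
print(n_total)  # 144425
```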

The spectra were linearly interpolated to produce a wavelength
sampling of 1 nm or finer to be used for simulations of
Gaia observations.
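
A minimal pure-Python sketch of such a linear resampling is given below. It is not the code used for the library; the function name is hypothetical, and it assumes at least two samples on a strictly increasing wavelength axis.

```python
def resample_linear(wavelengths, fluxes, step=1.0):
    """Linearly interpolate a spectrum onto a regular grid with the given step (nm)."""
    start, stop = wavelengths[0], wavelengths[-1]
    n_points = int((stop - start) / step + 1e-9) + 1
    new_wl = [start + k * step for k in range(n_points)]
    new_flux = []
    i = 0  # index of the left-hand sample of the current interval
    for wl in new_wl:
        # Advance to the interval [wavelengths[i], wavelengths[i + 1]] containing wl.
        while i < len(wavelengths) - 2 and wavelengths[i + 1] < wl:
            i += 1
        x0, x1 = wavelengths[i], wavelengths[i + 1]
        frac = (wl - x0) / (x1 - x0)
        new_flux.append(fluxes[i] + frac * (fluxes[i + 1] - fluxes[i]))
    return new_wl, new_flux
```

For instance, a spectrum sampled at 250, 252, and 254 nm is returned on the grid 250, 251, ..., 254 nm, with the intermediate fluxes lying on the straight lines between neighbouring samples.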

5. Comparison of the second library with 
Kennicutt's atlas 

Even though the second library of synthetic spectra has been
compared and found to be in good agreement with both SDSS
and LEDA photometric observations, we also used the library
to fit real spectra of galaxies in order to examine in more detail
its ability to represent reality and to classify galaxies according
to their Hubble type. We first compared the library with a
small number of observed spectra of high S/N and known
Hubble type. A catalog meeting these criteria is Kennicutt's atlas
of galaxies (Kennicutt 1992). This is a spectrophotometric atlas
containing spectra of 55 nearby normal and peculiar galaxies,
most covering the spectral range of 365-710 nm at a resolution
of 0.2 nm. The spectra were normalized with respect to the flux
at 550 nm. Since the flux at 550 nm is not provided in the PEGASE
spectra, our spectra were normalized to the mean flux between
549 and 551 nm.

Fig. 10. Distribution (normalized) of B-V colours for early-type,
spiral, irregular and quenched star-forming (red, black, blue and
green respectively) galaxies produced by the PEGASE.2 code.

Fig. 11. Random models of irregular (blue), quenched star-forming
galaxies (magenta), spirals (light blue), and early-type
galaxies (red). Black dots are SDSS galaxies and green the 8
typical synthetic spectra of PEGASE.2.

Even though the galaxies observed in this catalog are nearby 
galaxies, we first needed to transform them to the rest frame. 
This was achieved by keeping the energy in each spectral bin 
constant, while relabeling the wavelength axis. After this step, 
we had to rebin the spectra to ensure that their resolution was 
equal to that of PEGASE.2 (2 nm). We also ensured that all the
observational spectra had a common spectral range, namely 371 
to 679 nm (155 data points). 
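
The preprocessing steps above can be sketched in Python as follows. All function names are hypothetical, and the rebinning shown (a plain average of the samples falling in each 2 nm bin) is an assumed, simple choice; the text does not specify the rebinning scheme actually used.

```python
def to_rest_frame(wavelengths_nm, z):
    """Relabel the wavelength axis; the value stored in each bin is left unchanged."""
    return [wl / (1.0 + z) for wl in wavelengths_nm]


def common_grid(start_nm=371, stop_nm=679, step_nm=2):
    """Common wavelength grid shared by the observed and synthetic spectra."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def rebin_mean(wavelengths_nm, fluxes, grid_nm, step_nm=2):
    """Average all samples with centre - step/2 <= wavelength < centre + step/2."""
    half = step_nm / 2.0
    rebinned = []
    for centre in grid_nm:
        in_bin = [f for wl, f in zip(wavelengths_nm, fluxes)
                  if centre - half <= wl < centre + half]
        # Empty bins are flagged with NaN rather than filled in.
        rebinned.append(sum(in_bin) / len(in_bin) if in_bin else float("nan"))
    return rebinned


print(len(common_grid()))  # 155
```

The grid from 371 to 679 nm in steps of 2 nm contains 155 points, matching the number quoted in the text.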

To fit these spectra to our new library, we used two different
methods. In the first method, we did not account for the
wavelength ranges including the seven most important emission
lines, while in the second method we did. These seven emission
lines are the Hα + [NII] blend (654.8 and 658.3 nm; the second
line is not included in the PEGASE data), Hβ (486.1 nm), [OII]
(372.7 nm), [OIII] (500.7 nm), and [SII] (671.7 and 673.1 nm).
By excluding the three regions 370-380 nm, 480-510 nm, and
650-680 nm, these lines were in most cases excluded. This left
us with 123 data points in each spectrum. In the second method,
we decreased the spectral resolution of both the synthetic and
observational spectra in the wavelength ranges containing those
lines, and replaced the fluxes with broadband values; we then
had 126 data points in each spectrum. We followed this procedure
because PEGASE predicts only the total energy of each
emission line and we have no information about the line's shape
to compare with when fitting the observational spectra. In both
cases, we performed a χ²-fitting of the Kennicutt galaxy spectra
with our library and in this way checked the ability of the
synthetic spectra to reproduce and classify the observational ones.
We should also mention that the spectrophotometric calibration
of Kennicutt's atlas has an error of 10%, which of course affects
our results.
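
The first fitting method (emission-line regions excluded) can be illustrated with the following Python sketch. The exact form of the χ² statistic is not given in the text, so the weighting used here, a per-point uncertainty equal to 10% of the observed flux, is an assumption motivated only by the quoted calibration error; the function names and the dictionary-based library are likewise hypothetical. Observed fluxes are assumed to be non-zero.

```python
EXCLUDED_NM = [(370, 380), (480, 510), (650, 680)]


def keep_point(wl_nm):
    """True if the wavelength lies outside all excluded emission-line regions."""
    return not any(lo <= wl_nm <= hi for lo, hi in EXCLUDED_NM)


def chi_square(observed, template, wavelengths_nm, rel_error=0.10):
    """Chi-square over unmasked points, with sigma = rel_error * |observed| (assumed)."""
    total = 0.0
    for wl, obs, mod in zip(wavelengths_nm, observed, template):
        if keep_point(wl):
            sigma = rel_error * abs(obs)
            total += ((obs - mod) / sigma) ** 2
    return total


def best_template(observed, library, wavelengths_nm):
    """Return the label of the library spectrum with the smallest chi-square."""
    return min(library,
               key=lambda label: chi_square(observed, library[label], wavelengths_nm))
```

Both spectra are expected to be normalized in the same way beforehand (here, to the flux near 550 nm), so no free scaling factor is fitted in this sketch.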

When we excluded the above emission lines from our comparison,
17 of 55 galaxies were classified correctly, while when
we included them, the χ²-fitting gave 2 additional correct results
(19 in total). These results are very good if we bear in mind
that of the 55 galaxies included in the Kennicutt atlas only 28
correspond to typical Hubble types. In addition, of the 9
misclassifications that occur, two may be caused by other effects and
not by problems with our library. More specifically, NGC6217
is an Sc strongly interacting galaxy and therefore the error in its
classification as an irregular galaxy is not very important. This
is also the case for the misclassification of the irregular galaxy
NGC1569 as an early-type or spiral galaxy (when lines were
excluded or included, respectively), since its spectrum was affected
by high Galactic reddening. Of the remaining 27 galaxies
in Kennicutt's atlas that do not correspond to Hubble types, 8
are starburst galaxies, 4 are extreme emission-line galaxies, 7 are
Seyfert galaxies, and 4 are peculiar and merger galaxies. These
galaxy types are not included in our library. However, since the
quenched star-forming galaxies that we produced cover the same
part of the SDSS colour-colour diagram as the starburst galaxies
(Sect. 2.1), we are interested in fitting the starburst galaxy
spectra. Kennicutt's atlas includes four galaxies undergoing
global bursts of star formation and four galaxies that are nuclear
starburst galaxies. The results of fitting these galaxy spectra
with those included in our library showed that none of these
galaxies was classified as a quenched star-forming galaxy. This
could be because our method does not place much importance
on the emission lines, which are the most significant features in
the spectra of this galaxy type. However, a closer comparison
between the model quenched star-forming galaxies included in
our library and the observed starbursts showed that the
emission lines of this type of synthetic spectra are not as strong
as in the observational spectra.

The results of the classification are presented in Table 3 for
the case in which the emission lines are taken into consideration,
while the fitting of the spectra when excluding and including the
emission lines is presented in Appendix A. In all cases, the fitting
of the continua is very good and in most cases it is also good when
the emission lines are included in the comparison. We note that
the continua fitting does not deteriorate in most cases when the
emission lines are included. In most cases, the difference in the
scaling of the flux axis highlights the details of the continua most
clearly.

After deciding the best-fit model spectrum for each galaxy 
we can extract all the other information provided by PEGASE 
for each spectrum, such as SFR, metallicity history, mass of both 
stars and gas, and luminosity of the galaxy. 



Table 3. Classification results for galaxy spectra in
Kennicutt's atlas.

Kennicutt/PEGASE              Early-type   Spiral   Irregular

Early-type                    4            4        0
Spiral                        1            12       3
Irregular                     1            0        3
Starburst with global burst   0            1        3
Nucleus starburst             1            2        1

Note. Rows indicate the true class and columns the ones predicted by
the fitting with the synthetic spectra of our library. For these results we
have included the emission lines in the χ²-fitting.
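
The counts quoted in the text for the galaxies of typical Hubble type can be recovered from the first three rows of Table 3. The Python snippet below is only a bookkeeping check; it reads the empty cells of the table as zeros, which is an assumption, although one supported by the row totals.

```python
# Rows: true class; columns: predicted Early-type, Spiral, Irregular.
table3 = {
    "Early-type": [4, 4, 0],
    "Spiral": [1, 12, 3],
    "Irregular": [1, 0, 3],
}

classes = list(table3)
n_total = sum(sum(row) for row in table3.values())
n_correct = sum(table3[name][i] for i, name in enumerate(classes))
n_wrong = n_total - n_correct
print(n_total, n_correct, n_wrong)  # 28 19 9
```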



6. Simulated Gaia spectra 

The Gaia spectrophotometer is a slitless prism spectrograph 
comprising blue and red channels (called BP and RP respec- 
tively) that operate over the wavelength ranges 330-680 nm 
and 640-1050 nm respectively. Each of BP and RP is simulated
with 48 pixels, whereby the dispersion varies from 3-29 nm/pix 
and 6-15 nm/pix respectively. The 144 425 synthetic spectra of 
galaxies produced here were simulated for BP and RP Gaia spec- 
tra during cycle 3 of Gaia simulations. Additionally they were 
reddened by applying to each of them a random value of extinction
in our Galaxy (A_V in the range 0-10). The simulated
spectra are given for three values of G-band magnitude (G=15, 
G=18.5, and G=20). Randomly sampled noise, including the 
source Poisson noise, background Poisson noise, and CCD read- 
out noise, was added to all spectra. The final library contains 
1 733 100 simulated Gaia spectra. In the sections that follow, we 
present the results of the classification and parametrization of 
spectra with a noise characteristic of G=18.5 mag. 

Fig. 12. The mean S/N spectrum of the simulated galaxy spectra
used in the classification tests. Only points with SNR>3 (hori- 
zontal line) were used. 



Table 5. Galaxy classification with the SVM for the training set.

Type    E      S      Im     QSFG

E       484    25     0      0
S       5      1820   0      0
Im      0      0      255    0
QSFG    0      0      0      2296

Table 6. As Table 5, but for the test set.

Type    E      S      Im     QSFG

E       2033   274    0      0
S       215    8434   95     0
Im      0      91     1153   1
QSFG    3      1      9      11691

Note. The confusion matrices for galaxies at z=0 and G = 18.5 mag.
Columns indicate the true class and rows the predicted ones. The
labels E, S, Im, and QSFG correspond to early-type, spiral, irregular, and
quenched star-forming galaxies, respectively.



7. Classification & parametrization 

As in Paper I, we have used Support Vector Machine classifiers
(SVMs) (C-classification) to determine spectral types of the simulated
spectra and regression SVMs (ε-regression) when estimating
their astrophysical parameters. For a more detailed description
of SVMs and references, we refer to Paper I, while a
more general overview of the Gaia classification scheme can be
found in Bailer-Jones et al. (2008).

Throughout this section, we consider truncated spectra,
retaining only 77 of the 96 data points of the simulated BP
and RP spectra corresponding to the wavelength range 321.43-
998.02 nm and 613.78-1130.25 nm for the two photometers,
respectively. The data of the other 19 pixels were excluded based on
their very low values of SNR (<3) in the mean spectrum of
galaxies with zero redshift and reddening (Fig. 12). The rejection
of these pixels from the data improves the performance of the
SVM. Each of the remaining 77 pixels was standardized, that
is scaled to have zero mean and unit variance across the whole
sample, before being used by the SVM.
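
The pixel selection and standardization described above can be sketched in Python as follows. The function names are hypothetical, and the use of the population variance (division by the number of spectra) is an assumption, since the text does not say which variance estimator was used.

```python
import math


def select_pixels(mean_snr, threshold=3.0):
    """Indices of pixels whose mean S/N exceeds the threshold."""
    return [j for j, snr in enumerate(mean_snr) if snr > threshold]


def standardize(spectra):
    """Scale each pixel (column) to zero mean and unit variance across the sample."""
    n = len(spectra)
    n_pix = len(spectra[0])
    means = [sum(s[j] for s in spectra) / n for j in range(n_pix)]
    # Population standard deviation of each pixel over all spectra (assumed).
    stds = [math.sqrt(sum((s[j] - means[j]) ** 2 for s in spectra) / n)
            for j in range(n_pix)]
    return [[(s[j] - means[j]) / stds[j] for j in range(n_pix)] for s in spectra]
```

Standardizing per pixel puts the bright and faint parts of the spectrum on a comparable scale, so that no single wavelength range dominates the kernel distances computed by the SVM.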

For all the classification and regression tests performed here,
the samples used in each case were randomly split into two sets.
The first one was used for training the SVMs and the second
for testing their performance. For the tuning of SVMs, i.e., the
selection of the optimal values of the internal parameters used
by the method (C and γ, and in the case of regression also ε), we
used two different schemes (four-fold cross validation or a fixed
scheme), depending on the amount of data in the training set. In
Table 4, we present the total number of spectra used in each test
as well as the number of spectra in the training and test sets and
the scheme used for the tuning of SVMs in each case.
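
A random split followed by four-fold cross validation on the training set can be sketched in plain Python as below. The helper names and the fixed seed are illustrative only; the sample sizes are those listed in Table 4 for the classification of galaxies at zero redshift.

```python
import random


def split_train_test(n_samples, n_train, rng):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(train_indices, k=4):
    """Yield (fit, validate) index lists for k-fold cross validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
train, test = split_train_test(28885, 4885, rng)
```

Each candidate setting of the SVM internal parameters would then be scored on the four validation folds, and the best-scoring setting retrained on the full training set before being applied to the test set.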

7.1. Galaxies without reddening at zero redshift at G=18.5 

This subset of the library comprises 28 885 galaxies produced
by a random grid of parameters in PEGASE.2 with zero reddening
and redshift at G = 18.5 mag. For these data, we performed
both classification of the galaxy type and regression of the input
parameters of PEGASE (i.e., the parameters included in the SFR
in each case and the infall timescale if present) as well as regression
of the most significant output parameters of the model.



7.1.1. Classification 

Using the first subset presented in Table 4, we trained the SVMs
to classify the data into the four different galaxy types that it
contains. The number of support vectors was 438 and the results
for the training and test sets are given in Tables 5 and 6.

From these tables, we see that the results are very good. The
number of misclassifications is 30 for the training set and 689
for the test set, corresponding to errors of 0.6% and 2.9%,
respectively.
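
The quoted error rates follow directly from the confusion matrices in Tables 5 and 6, as the short Python check below shows. Empty table cells are read as zeros, an assumption that is consistent with the matrix totals matching the training and test set sizes of 4885 and 24 000.

```python
# Class order E, S, Im, QSFG; values transcribed from Tables 5 and 6.
train_matrix = [
    [484, 25, 0, 0],
    [5, 1820, 0, 0],
    [0, 0, 255, 0],
    [0, 0, 0, 2296],
]
test_matrix = [
    [2033, 274, 0, 0],
    [215, 8434, 95, 0],
    [0, 91, 1153, 1],
    [3, 1, 9, 11691],
]


def error_rate(matrix):
    """Return (number misclassified, total, misclassified fraction)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return total - correct, total, (total - correct) / total


print(error_rate(train_matrix))  # 30 of 4885
print(error_rate(test_matrix))   # 689 of 24000
```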

Because of the large differences between the two spectral
libraries used in this work and the work presented in Paper I, 
the results are not directly comparable. More specifically, the 
library used here includes four galaxy types instead of the seven 
that were included in the previous version. In Paper I, we applied 
strict selection criteria to the spectra to determine the galaxy type 
as accurately as possible. This led to artificial gaps in the colour- 
colour space, which of course made the classification process 
easier. Finally, noise was not included in the data in Paper I, 
in contrast to what we wrote there, because of an error in our 
software. The library in the present paper - which now includes 
the correct values of noise - is also more comprehensive, so the 
results shown here supersede those in Paper I. All these factors 
have a large impact on the results with SVMs. 

7.1.2. Regression of input parameters of PEGASE

As described in the introduction, the models used to produce
the second library (Table 1) are not the same for each galaxy
type and therefore the parameters included are different (Table
2). For this reason, we decided to perform the parametrization
independently for each galaxy type. In the future, we plan to
perform first a classification of the galaxy type and then to have four
parametrizers (one for each type) that would be selected based
on the type of galaxy extracted. Here we perform the regression
tests assuming the classification was 100% correct. In the
paragraphs that follow we discuss the results, presented in Fig. 13
and Table 7, for each type separately.

i) Early-type galaxies. For the spectra of early-type galaxies,
we performed the regression for the p1 and p2 parameters
using the data described in Table 4 and obtained quite good
results. Comparing these results with those for models of other
galaxy types, we see that the estimation of the parameters for
these models is much more precise. This is possibly a result of
the simpler model used to produce this type of galaxy. Using an
exponential SF law characterized by only 2 parameters, we
produced spectra characterized by fewer degeneracies and therefore
easier to parametrize.

Table 4. The number of spectra and the tuning scheme used for the various SVM tests.

Test performed                                                    Total     Training  Test      Scheme used for the
                                                                  spectra   set       set       tuning of SVMs
---------------------------------------------------------------------------------------------------------------------
Galaxies without reddening at zero redshift
  Classification of galaxy types                                  28 885    4885      24 000    4-fold cross validation
  Regression of input APs for early-type galaxy models            2816      704       2112      4-fold cross validation
  Regression of input APs for spiral galaxy models                10 569    2643      7926      fixed (fix=4)
  Regression of input APs for irregular galaxy models             1500      750       750       4-fold cross validation
  Regression of input APs for quenched star-forming galaxy models 14 000    3500      10 500    fixed (fix=4)
  Regression of output APs for all galaxy models                  28 885    4885      24 000    fixed (fix=4)
Galaxies with redshift, without reddening
  Classification of galaxy types                                  144 425   14 440    129 985   fixed (fix=4)
  Regression of redshift for all galaxy models                    144 425   14 440    129 985   fixed (fix=4)
Galaxies with redshift and reddening
  Regression of redshift for all galaxy models                    144 425   14 440    129 985   fixed (fix=4)
  Regression of reddening for all galaxy models                   144 425   14 440    129 985   fixed (fix=4)
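Table 4 lists 4-fold cross validation as the tuning scheme for several of the tests. As a generic illustration of that idea, the sketch below splits a training set into four folds and yields each training/validation partition in turn. The function name `k_fold_indices`, the random shuffling, and the seed are assumptions made for this example; the text here does not say how the folds were actually drawn.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (training, validation) index lists for k-fold cross validation.

    Every sample appears in exactly one validation fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Example: the 704 early-type training spectra of Table 4.
splits = list(k_fold_indices(704, k=4))
```

For each candidate set of SVM hyperparameters, a model would be trained on `training` and scored on `validation`, and the scores averaged over the four folds.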

ii) Spiral galaxies. For spiral galaxies the parametrization
was performed on three parameters (2 for the SFR and 1 for the
infall timescale). The results of the regression are quite poor,
especially for the p1 parameter. This is partly because
the SVM tuning was performed with a less detailed scheme, but
mainly because of the higher complexity of the model used to
produce the spectra of this galaxy type.

iii) Irregular galaxies. As in the case of spiral galaxies, the 
regression for spectra of irregular galaxies was performed for the 
same three parameters with SVM. Once again the results are not 
very good as can be seen from the large scatter in the resulting 
plots of the real against the predicted values. The results are sim- 
ilar to the case of spiral galaxies, which was expected since the 
models producing these two types of galaxies are the same and 
they differ only in the values of the model parameters. The re- 
sults between these two types seem to differ a lot only in the case 
of the infall timescale parameter, where the very sparse grid for 
higher values of infall timescale leads to a poor training of the 
SVMs. 

iv) Quenched star-forming galaxies. To perform regression
for the four parameters of the models of quenched star-forming
galaxies, we used the sample and the tuning scheme presented in
Table 4. For the parameters in common between the models used
to produce both irregular and quenched star-forming galaxies,
we can see that the performance of the SVMs in estimating them
is very similar and very poor. The only parameter of this galaxy
type extracted with quite good accuracy is p3, which seems to
have a large and direct impact on the galaxy spectra.

7.1.3. Regression of output parameters of PEGASE

The regression of the galaxy parameters was also performed for
the nine most significant output parameters of PEGASE, as in
Paper I. We expect these parameters to be more strongly and
directly related to the galaxy spectra and therefore easier to
extract with the SVMs. Since these parameters are common to all
the galaxy types, we performed the regression for the whole
sample of our simulated spectra. We can indeed estimate them more
accurately (see Table 8 and Fig. 14) than the input parameters.
The largest errors appear for the SNIa and SNII rates, as well as
for the remaining mass of gas in the galaxy and the current SFR.
For the first two parameters, this was expected since they are
more closely related to the emission lines of the spectrum, to
which Gaia observations will not be very sensitive because of
their low resolution. For the gas mass, the problem is caused
mainly by irregular and quenched star-forming galaxies. Even
though these types of galaxies are produced with models very
similar to the ones used for spiral galaxies, the range of the
input parameters used for them is very different. In particular,
the values of the infall timescale are much higher in the case of
starburst and irregular galaxies, especially compared to their
age (see Table 2). This leads to degeneracies in the produced
spectra. For example, galaxies that include a significant gas
component and are dominated by a young stellar population might
have the same amount of gas as galaxies with older stellar
populations that are expected to have depleted their gas
component but may have ongoing gas infall. For the current SFR,
we observe that for many galaxies with a zero SFR, the estimated
value was quite high. This problem is caused mainly by the
quenched star-forming galaxies, for which the SFR is currently
zero but in many cases the SF stopped only very recently (e.g.,
1 Myr ago). This scenario leads to a spectrum that is very
similar to that of an irregular galaxy (i.e., with a high current
SFR), which is therefore difficult for the SVMs to identify.

For all the other parameters, the results are very accurate. 
These results indicate that we will be able to predict most of 
the astrophysical parameters that characterize the galaxy spectra 
with quite good accuracy, at least for G<18.5. 

7.2. Galaxies with redshift, without reddening at G=18.5 

We present both the classifications of galaxy type and the regres- 
sions of redshift for 144 425 simulated spectra of random red- 
shift, without reddening at G=18.5. In both cases, 14 440 spec- 
tra were used for training and 129 985 for testing. Only a coarse 
search for the optimal SVM hyperparameters was performed. 

Fig. 13. Galaxy parameter estimation performance. For each of the input APs we plot the predicted vs. true AP values for the test
set. The red line indicates the line of perfect estimation. The summary errors are given in Table 7. Panels: Early type (p1, p2);
Spirals (p1, p2, infall); Irregulars (p1, p2, infall); QSFG (p1, p2, p3, infall).

In Tables 9 and 10 (confusion matrices), we can see that the
misclassifications number 454 for the training set and 6813 for
the testing set, corresponding to errors of 3.1% and 5.2%
respectively. Comparing these results with those in Sect. 7.1.1
(2.9% for the testing set), we can see that although the error is
small, it is almost twice as large. This result agrees with those
in our previous paper (Tsalmantza et al. 2007) for the tests of
the first library, where we concluded that the redshift is a
parameter that should be estimated in advance of the
classification and the estimation of the other parameters.
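The quoted error rates can be recomputed from the confusion matrices of Tables 9 and 10. The short sketch below does so, taking rows as predicted classes and columns as true classes, as stated in the table note. Two things are assumptions of this example: the helper names are illustrative, and cells that are blank in the tables are taken to be zero, which is consistent with the totals of 14 440 and 129 985 spectra.

```python
LABELS = ["E", "S", "Im", "QSFG"]

# Rows: predicted class; columns: true class (Table 9, training set).
TRAIN = [
    [1131, 276, 0, 1],
    [41, 5238, 10, 3],
    [0, 63, 701, 3],
    [0, 53, 4, 6916],
]

# Same layout for the test set (Table 10).
TEST = [
    [9343, 3319, 0, 10],
    [666, 46283, 455, 149],
    [0, 1022, 5516, 195],
    [0, 792, 205, 62030],
]


def error_rate(matrix):
    """Return (misclassified, total, misclassified / total)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    wrong = total - correct
    return wrong, total, wrong / total


def completeness(matrix):
    """Fraction of each true class (column) that received the correct label."""
    n = len(matrix)
    return {
        LABELS[j]: matrix[j][j] / sum(matrix[i][j] for i in range(n))
        for j in range(n)
    }


train_wrong, train_total, train_rate = error_rate(TRAIN)
test_wrong, test_total, test_rate = error_rate(TEST)
```

With these matrices the off-diagonal counts are 454 of 14 440 and 6813 of 129 985, i.e. the 3.1% and 5.2% quoted in the text.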

If we follow this classification scheme, the accuracy in the
performance of regression for the redshift parameter will be very
important to all the results we will extract from galaxy
observations with Gaia. Using the same subsets for training and
testing SVMs and following the same procedure for tuning as in
the classification, we extracted the values of redshift. The
results are very good and they are presented in Table 11 and
Fig. 15. This is very promising for the performance of the
classification and parametrization of the Gaia galaxy
observations.

7.3. Galaxies with reddening, redshift at G=18.5 

We used the 144 425 galaxy spectra that include the effects of
reddening to perform a regression analysis of the Av and z
parameters. The number of spectra used as a training and testing
set, as well as the scheme used for the tuning of the SVMs, is
given once again in Table 4.

Table 7. Summary of the performance of the SVM regression models for predicting the input APs of the galaxy models.

Astrophysical parameter          mean(real-predicted)/mean(real)   sd(real-predicted)/mean(real)   SVs
------------------------------------------------------------------------------------------------------
Early-type galaxies
  p1                             -0.008                            0.250                           674
  p2                             -0.001                            0.076                           646
Spiral galaxies
  p1                             0.018                             0.384                           2440
  p2                             0.010                             0.451                           1866
  infall timescale               0.009                             0.234                           1786
Irregular galaxies
  p1                             0.004                             0.174                           647
  p2                             0.030                             0.519                           736
  infall timescale               0.089                             0.393                           655
Quenched star-forming galaxies
  p1                             0.014                             0.195                           3116
  p2                             0.044                             0.479                           3127
  p3                             0.003                             0.335                           3077
  infall timescale               0.038                             0.524                           3466

Note. The sample is for zero redshift and interstellar extinction (Av). The second and third columns list the mean and RMS errors, respectively.
The final column gives the number of support vectors in the SVM model.

Table 8. Similar to Table 7 but for the prediction of the output APs of the galaxy models.

Astrophysical parameter                                    mean(real-predicted)/mean(real)   sd(real-predicted)/mean(real)   SVs
--------------------------------------------------------------------------------------------------------------------------------
mass to light ratio (M/L)                                  4.79e-4                           3.97e-2                         4566
normalized star formation rate (SFR)                       -7.69e-3                          3.05e-1                         4419
metallicity of interstellar medium (Mim)                   4.52e-3                           4.99e-1                         3617
metallicity of stars averaged on mass (Msm)                2.73e-3                           6.98e-2                         4796
normalized mass of gas (Mgas)                              -2.72e-3                          2.25e-1                         4499
normalized mass in stars (Ms)                              -2.57e-3                          9.03e-2                         4583
mean age of stars averaged on bolometric luminosity (Al)   -5.49e-5                          8.25e-2                         4768
normalized SNIa rate (SNIa)                                -1.17e-3                          1.85e-1                         2966
normalized SNII rate (SNII)                                1.70e-2                           2.47e-1                         3660

Table 11. Summary of the performance of the SVM regression models for predicting the z.

Astrophysical parameter   mean(real-predicted)/mean(real)   sd(real-predicted)/mean(real)   SVs
-----------------------------------------------------------------------------------------------
z                         -3.61e-5                          0.070                           2160

Note. The sample is for zero interstellar extinction (Av) and five random values of redshift in the range 0-0.2. The second and third columns list
the mean and RMS errors respectively. The final column gives the number of support vectors in the SVM model.

Fig. 15. Galaxy redshift estimation performance. We plot the
predicted versus true z values for the test set. The red line
indicates the line of perfect estimation. The summary errors are
given in Table 11.

Table 9. Galaxy classification with the SVM for the training set.

Type    E       S       Im      QSFG
------------------------------------
E       1131    276     0       1
S       41      5238    10      3
Im      0       63      701     3
QSFG    0       53      4       6916

Table 10. As Table 6 but for the test set.

Type    E       S       Im      QSFG
------------------------------------
E       9343    3319    0       10
S       666     46283   455     149
Im      0       1022    5516    195
QSFG    0       792     205     62030

Note. The confusion matrices for galaxies at z > 0. Columns indicate
the true class and the rows the predicted ones. The labels E, S, Im, and
QSFG represent early-type, spiral, irregular, and quenched star-forming
galaxies, respectively.

Fig. 14. Galaxy parameter estimation performance. For each of the output APs we plot the predicted vs. true AP values for the test
set. The red line indicates the line of perfect estimation. The summary errors are given in Table 8.

As expected, the results of the regression analysis for the
redshift parameter (Fig. 16 and Table 12) are worse than in the
case where no reddening was included in the data, the problem
becoming increasingly obvious towards high redshift. This
indicates that we should estimate the reddening values before
performing the regression of the redshift.

The results of regression for the Av parameter are once again
very good (Fig. 17 and Table 12). These results are promising,
because they indicate that we should be able to estimate the
effects of extinction with reasonable accuracy before extracting
the values of the redshift and performing the rest of our
classification scheme. We plan to compare the Av estimated in
this way with that estimated from Gaia data of nearby stars.
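The ordering suggested by these tests (extinction first, then redshift, then galaxy type, then a type-specific parametrizer) can be written as a small pipeline. The sketch below is only an illustration of that ordering: `analyse_galaxy` and all the callables passed to it are hypothetical stand-ins for trained models, and the assumption that each stage receives the earlier estimates as extra inputs is ours, since the text does not specify how they would be passed on.

```python
from typing import Any, Callable, Dict, Sequence

Spectrum = Sequence[float]


def analyse_galaxy(
    spectrum: Spectrum,
    estimate_av: Callable[[Spectrum], float],
    estimate_z: Callable[[Spectrum, float], float],
    classify: Callable[[Spectrum, float, float], str],
    parametrizers: Dict[str, Callable[[Spectrum, float, float], Dict[str, float]]],
) -> Dict[str, Any]:
    """Run the stages in the order proposed in the text.

    1. extinction Av, 2. redshift z, 3. galaxy type,
    4. the parametrizer trained for that type.
    """
    av = estimate_av(spectrum)
    z = estimate_z(spectrum, av)
    galaxy_type = classify(spectrum, av, z)
    params = parametrizers[galaxy_type](spectrum, av, z)
    return {"Av": av, "z": z, "type": galaxy_type, "params": params}


# Placeholder models returning constants, used only to show the call pattern.
result = analyse_galaxy(
    [1.0, 0.9, 0.8],
    estimate_av=lambda s: 0.5,
    estimate_z=lambda s, av: 0.1,
    classify=lambda s, av, z: "E",
    parametrizers={"E": lambda s, av, z: {"p1": 1.0, "p2": 2.0}},
)
```

Keeping one parametrizer per type mirrors the plan described in Sect. 7.1.2, where the input parameters differ between galaxy types.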

8. Discussion and conclusion 

The first results of the SVM classification and parametrization
of the second library of synthetic galaxy spectra are very good
for the classification of galaxy types and the regression of most
of the output parameters of the model, redshift, and reddening.
However, we emphasize that although the regression results seem
to be quite accurate for most output parameters estimated here
(e.g., current SFR, metallicity, stellar mass), they might
include large errors because of discrepancies between models and
reality. All the parameter values used here to train the SVMs are
model dependent. The models used are a simplification of the
complex structure and evolution of galaxies and therefore cannot
be expected to yield accurate parameter values for individual
galaxies. These models are also unable to simulate the complete
range of detail occurring in the universe. For these reasons, the
results presented here should be used for statistical studies of
the main galaxy properties in the local universe and not as
absolute values for each individual galaxy. In contrast to the
output parameters, the results are very poor for the majority of
the astrophysical parameters used to produce each type of galaxy,
implying that Gaia will not be able to provide accurate
measurements of the input galaxy parameters. However, this is
something that we should investigate further (e.g., using
different parametrization methods) to check whether these results
can be improved.

Fig. 16. Galaxy redshift estimation performance. We plot the
predicted versus true z values for the test set. The red line
indicates the line of perfect estimation. The summary errors are
given in Table 12.

Table 12. As Table 11 for the z and Av parameters for galaxy spectra that include redshift and reddening.

Astrophysical parameter   mean(real-predicted)/mean(real)   sd(real-predicted)/mean(real)   SVs
-----------------------------------------------------------------------------------------------
z                         0.012                             0.190                           7291
Av                        0.002                             0.100                           5478

Fig. 17. Galactic-interstellar extinction estimation performance.
We plot the predicted versus true Av values for the test set. The
red line indicates the line of perfect estimation. The summary
errors are given in Table 12.

We have used the PEGASE.2 galaxy evolution model and
observational data from SDSS to solve problems with our first
library and extend the library to cover the large majority of the
observed parameter space. In this way, an extended library of
28 885 synthetic galaxy spectra was created at zero redshift and
reproduced in addition for 4 random values of redshift. The whole
library was produced for a random grid of the astrophysical
parameters used by PEGASE.2 models. The model used in PEGASE.2
to create early-type galaxies was changed and an exponential
model for their SFR was adopted. Models for quenched star-forming
galaxies, which were not included in the first library, were also
added. In the case of irregular and spiral galaxies we extended
the range of input parameter values. The resulting library
includes four general Hubble types instead of the seven that were
included in the first library and covers almost all the variance
in the SDSS photometric observations. To investigate the range of
input parameters in the models for each type we made use of
photometric data (SDSS and Paturel et al. 1997). Even though the
comparison of our library with colour observations provides good
results, it is possible that some combinations or values of input
parameters produce spectra that do not correspond to realistic
galaxy spectra of those types. As an example, we suggest that
values of the p1 parameter of the early-type galaxies as high as
30 Gyr might lead to unrealistic spectra of early-type galaxies.
These values were kept because of the good agreement with the
photometric observations, but they might be excluded or
characterized as spectra of a different galaxy type in future
versions of our library if they are found not to match real
spectra. For this purpose, we intend to compare the second
library of synthetic spectra of galaxies with a larger sample of
observational spectra from SDSS.

The second library produced here was compared with other
observations, both photometric (Paturel et al. 1997) and
spectroscopic (Kennicutt 1992), and found to be in good agreement
with them. The only problem appears in the case of quenched
star-forming galaxies, where the synthetic spectra of this type
do not seem to fit any type of observed spectra in the Kennicutt
Atlas very well (Fig. A.5). Additionally, even though the SDSS
colours of this type of galaxy are very similar to those of
starburst galaxies, the comparison of the spectra of these two
types showed that the synthetic spectra do not reproduce the
strong emission lines present in the observational data. To solve
this problem we intend to produce starburst galaxies using a new
version of the PEGASE model that includes new models for all the
mechanisms that are important to this type of galaxy. In future
versions of our synthetic library, we intend to investigate the
role of a wider range of astrophysical parameters in the models
used in PEGASE. For example, we need to investigate the range of
the galaxy age parameter, which has a great impact on the output
spectra. Here, it was simply kept constant at 9 or 13 Gyr
depending on the galaxy type.

For the task of classification and parametrization of un- 
resolved galaxies with Gaia, we will also construct a semi- 
empirical library of galaxy spectra. This library will include ob- 
servational spectra from SDSS, which will be extended to the 
wavelength range of Gaia by our synthetic spectra. The advan- 
tage of this library is that it provides a set of real observed spec- 
tra, with the corresponding astrophysical parameters, as defined 
by comparing each spectrum with the synthetic spectra. A library
of observed galaxy spectra combined with the already produced
synthetic libraries will allow us to check and improve our
classification system and to test the reliability and
completeness of our libraries.
Appendix A: Comparison of the second library with
Kennicutt's atlas

We present the results of the χ²-fitting of the galaxy spectra in
Kennicutt's atlas (black lines) with the ones in our library (red
lines). The results are presented in 5 blocks, each one containing
the real spectra of a particular galaxy type. The synthetic
spectrum plotted over the real one corresponds to the one with the
smallest χ²-difference. For every spectrum, we present both
results, extracted when the χ²-fitting was performed by excluding
or including the areas with the strongest emission lines. The
vertical green lines indicate those spectral areas (from the
beginning until the first line, between the two lines in the
middle, and from the last line until the end of the spectrum).
When the strongest emission lines were included in the comparison,
the resolution in those three areas was decreased to only one
point for both sets of spectra.
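The matching procedure described above can be illustrated with a minimal sketch. It is not the code used for the figures. It assumes that observed and synthetic spectra share one wavelength grid, that both are normalized by their mean flux over the compared points, and that the χ²-difference is an unweighted sum of squared flux differences. All function names and the `line_regions` argument are hypothetical. The two modes mirror the text: `"exclude"` drops the emission-line areas, while `"bin"` reduces each such area to a single averaged point.

```python
def prepare(wavelengths, flux, line_regions, mode):
    """Return the flux points to compare, normalized by their mean.

    mode="exclude": drop every point inside an emission-line region.
    mode="bin":     keep the other points and add one mean-flux point
                    per emission-line region.
    """
    def inside(w):
        return any(lo <= w <= hi for lo, hi in line_regions)

    points = [f for w, f in zip(wavelengths, flux) if not inside(w)]
    if mode == "bin":
        for lo, hi in line_regions:
            region = [f for w, f in zip(wavelengths, flux) if lo <= w <= hi]
            if region:
                points.append(sum(region) / len(region))
    elif mode != "exclude":
        raise ValueError("mode must be 'exclude' or 'bin'")
    norm = sum(points) / len(points)
    return [p / norm for p in points]


def chi2_difference(a, b):
    """Unweighted sum of squared differences (assumption: no error weights)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(wavelengths, observed, library, line_regions, mode="exclude"):
    """Return (index, chi2) of the library spectrum closest to the observed one."""
    target = prepare(wavelengths, observed, line_regions, mode)
    scores = [
        chi2_difference(target, prepare(wavelengths, synthetic, line_regions, mode))
        for synthetic in library
    ]
    best = min(range(len(library)), key=scores.__getitem__)
    return best, scores[best]


# Toy example: one strong "line" at 600 nm in the observed spectrum.
grid = [400.0, 500.0, 600.0, 700.0]
index, score = best_match(
    grid, [1.0, 2.0, 50.0, 4.0], [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]],
    line_regions=[(590.0, 610.0)],
)
```

Running the fit in both modes for the same observed spectrum corresponds to the pairs of plots shown for each galaxy in the figures of this appendix.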
Fig. A.1. Results of the χ²-fitting of early-type galaxy spectra in Kennicutt's atlas (black lines) with our library (red lines).
The results are presented in rows. Every two plots give the results for one observed galaxy when the χ²-fitting was performed by
excluding or including the areas with the strongest emission lines respectively. The y-axis corresponds to the normalized fluxes,
while the x-axis presents the wavelengths and extends from 350 to 700 nm. The galaxies presented here are: first row: NGC3379,
NGC3245, NGC4472, and NGC3941; second row: NGC4648, NGC4889, NGC4262, and NGC5866.




Fig. A.2. Same as Fig. A.1 for the spiral galaxies in Kennicutt's atlas. First row: NGC1357, NGC2276, NGC2775, and
NGC4775; second row: NGC5248, NGC3368, NGC3623, and NGC6217; third row: NGC1832, NGC2903, NGC3147, and
NGC4631; fourth row: NGC3627, NGC6181, NGC4750, and NGC6643.




Fig. A.3. Same as Fig. A.1 for the Magellanic-type irregular galaxies in Kennicutt's atlas. Here we present: NGC1569,
NGC4485, NGC4449, and NGC4670.




Fig. A.4. Same as Fig. A.1 for the I0 irregular galaxies in Kennicutt's atlas. Here we present: NGC3034, NGC5195, NGC3077,
and NGC6240.




Fig. A.5. Same as Fig. A.1 for the quenched star-forming galaxies with global starbursts in Kennicutt's atlas. Here we present:
NGC3310, NGC6052, NGC3690, and UGC6697.

Fig. A.6. Same as Fig. A.1 for the nuclear starburst galaxies in Kennicutt's atlas. Here we present: NGC2798, NGC3471,
NGC5996, and NGC7714.



